Rational activation function



Supplementary Material of Rational neural networks

Neural Information Processing Systems

Finally, we use the identity $\mathrm{ReLU}(x) = \frac{|x| + x}{2}$, $x \in \mathbb{R}$, to define a rational approximation to the ReLU function on the interval $[-1, 1]$ as $\tilde{r}(x) = \frac{1}{2}\left(\frac{x\,r(x)}{1+\epsilon} + x\right)$. Therefore, we have the following inequalities for $x \in [-1, 1]$:
$$\left|\mathrm{ReLU}(x) - \tilde{r}(x)\right| = \frac{1}{2}\left||x| - \frac{x\,r(x)}{1+\epsilon}\right| \le \frac{1}{2(1+\epsilon)}\left(\big||x| - x\,r(x)\big| + \epsilon |x|\right) \le \frac{\epsilon}{1+\epsilon} \le \epsilon.$$
We now show that ReLU neural networks can approximate rational functions. The structure of the proof closely follows [12, Lemma 1.3]. The statement of Theorem 3 comes in two parts, and we prove them separately.
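The identity at the start of the proof can be checked numerically; a minimal sketch (function names are illustrative, not from the paper):

```python
# Sanity check of the identity ReLU(x) = (|x| + x) / 2 used in the proof.
def relu(x):
    return max(x, 0.0)

def relu_via_abs(x):
    # For x >= 0: (x + x)/2 = x; for x < 0: (-x + x)/2 = 0.
    return (abs(x) + x) / 2

for x in [-1.0, -0.3, 0.0, 0.5, 1.0]:
    assert abs(relu(x) - relu_via_abs(x)) < 1e-12
```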



Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning

Surdej, Rafał, Bortkiewicz, Michał, Lewandowski, Alex, Ostaszewski, Mateusz, Lyle, Clare

arXiv.org Artificial Intelligence

Trainable activation functions, whose parameters are optimized alongside network weights, offer increased expressivity compared to fixed activation functions. Specifically, trainable activation functions defined as ratios of polynomials (rational functions) have been proposed to enhance plasticity in reinforcement learning. However, their impact on training stability remains unclear. In this work, we study trainable rational activations in both reinforcement and continual learning settings. We find that while their flexibility enhances adaptability, it can also introduce instability, leading to overestimation in RL and feature collapse in longer continual learning scenarios. Our main result is demonstrating a trade-off between expressivity and plasticity in rational activations. To address this, we propose a constrained variant that structurally limits excessive output scaling while preserving adaptability. Experiments across Meta-World and DeepMind Control Suite (DMC) environments show that our approach improves training stability and performance. In continual learning benchmarks, including MNIST with reshuffled labels and Split CIFAR-100, we reveal how different constraints affect the balance between expressivity and long-term retention. Preliminary experiments in discrete action domains (e.g., Atari) did not show similar instability, suggesting that the trade-off is particularly relevant for continuous control. Together, our findings provide actionable design principles for robust and adaptable trainable activations in dynamic, non-stationary environments.
Figure 1: Interquartile Mean (IQM) performance after 1M environment steps, aggregated across 15 Meta-World and 15 DeepMind Control Suite (DMC) environments. For Meta-World, we measure the score, while for DMC, returns are divided by 1000 to match the upper performance bound.
We compare Original Rationals (OR), our Constrained Rationals (CR), ReLU, and ReLU with Layer Normalization (LN), all trained with resets. Our results show that CR + Resets achieves the highest overall performance, highlighting the benefits of our proposed constraints in stabilizing RL training. Neural network expressivity is a key factor in reinforcement learning (RL), particularly in dynamic environments where agents must continuously adapt. While most RL architectures rely on static activation functions, recent work suggests that allowing activations to be trainable could enhance adaptability by increasing the flexibility of individual neurons.
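The abstract above does not specify the exact form of the proposed constraint, so the following is only a hypothetical sketch of the general idea: a rational activation whose denominator is forced to stay at least 1, which caps how much the unit can amplify its input while the numerator coefficients remain freely trainable.

```python
import numpy as np

# Hypothetical "constrained rational" activation (not the authors' code):
# the denominator Q(x) = 1 + |b1*x + b2*x^2 + ...| is >= 1 everywhere,
# so the output magnitude never exceeds |P(x)| and there are no poles.
def constrained_rational(x, a, b):
    """a: numerator coefficients (low to high degree); b: denominator coeffs."""
    a = np.asarray(a, dtype=float)
    num = np.polyval(a[::-1], x)                        # P(x) = a0 + a1*x + ...
    den = 1.0 + np.abs(np.polyval(np.r_[0.0, b][::-1], x))
    return num / den

# With P(x) = x and Q(x) = 1 + |x|, the unit behaves like a bounded softsign.
y = constrained_rational(np.linspace(-3, 3, 7), a=(0.0, 1.0), b=(1.0,))
```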


Adaptive Rational Activations to Boost Deep Reinforcement Learning

Delfosse, Quentin, Schramowski, Patrick, Mundt, Martin, Molina, Alejandro, Kersting, Kristian

arXiv.org Artificial Intelligence

Latest insights from biology show that intelligence not only emerges from the connections between neurons, but that individual neurons shoulder more computational responsibility than previously anticipated. Specifically, neural plasticity should be critical in the context of constantly changing reinforcement learning (RL) environments, yet current approaches still primarily employ static activation functions. In this work, we motivate the use of adaptable activation functions in RL and show that rational activation functions are particularly suitable for augmenting plasticity. Inspired by residual networks, we derive a condition under which rational units are closed under residual connections and formulate a naturally regularised version. The proposed joint-rational activation allows for desirable degrees of flexibility, yet regularises plasticity to an extent that avoids overfitting by leveraging a mutual set of activation function parameters across layers. We demonstrate that equipping popular algorithms with (joint) rational activations leads to consistent improvements on different games from the Atari Learning Environment benchmark, notably making DQN competitive with DDQN and Rainbow. Neural networks' efficiency in approximating any function has made them the default choice in many machine learning tasks. This is no different in deep reinforcement learning (RL), where the DQN algorithm's introduction (Mnih et al., 2015) has sparked the development of various neural solutions. In concurrence with former neuroscientific explanations of brainpower residing in combinations stemming from trillions of connections (Garlick, 2002), present advances have emphasised the role of the neural architecture (Liu et al., 2018; Xie et al., 2019). However, research has also progressively shown that individual neurons shoulder more complexity than initially expected, with the latest results demonstrating that dendritic compartments can compute complex functions (e.g.
This finding seems to have renewed interest in activation functions (Georgescu et al., 2020; Misra, 2020). In fact, many functions have been adopted across different domains (Redmon et al., 2016; Brown et al., 2020; Schulman et al., 2017). To reduce the bias introduced by a fixed activation function and achieve higher expressive power, one can further learn which activation function is performant for a particular task (Zoph & Le, 2017; Liu et al., 2018), learn to combine arbitrary families of activation functions (Manessi & Rozza, 2018), or find coefficients for polynomial activations as weights to be optimised (Goyal et al., 2019). Figure 1: Neural plasticity due to trainable activation functions allows RL agents to adapt to environments of increasing complexity. Rational activations (bottom), with shared parameters in each of the last two layers, evolve together with their input distributions (shaded blue) when learning with DQN on Time Pilot.
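The sharing idea behind the joint-rational activation can be illustrated with a small sketch (this is an assumption-laden illustration, not the authors' implementation): a single set of rational-function coefficients is instantiated once and reused by every layer, so an update to the activation's parameters regularises plasticity across the whole network.

```python
import numpy as np

# Sketch of a rational activation with one shared parameter set.
# Absolute value in the denominator keeps Q(x) >= 1 (no poles), a common
# "safe" parameterisation for rational/Pade activation units.
class JointRational:
    def __init__(self, num=(0.0, 1.0, 0.0, 0.0), den=(1.0, 0.0)):
        self.num = np.asarray(num, dtype=float)  # P coefficients, low to high
        self.den = np.asarray(den, dtype=float)  # Q's non-constant part

    def __call__(self, x):
        p = np.polyval(self.num[::-1], x)
        q = 1.0 + np.abs(np.polyval(np.r_[0.0, self.den][::-1], x))
        return p / q

act = JointRational()             # one instance with one parameter set...
h1 = act(np.array([-1.0, 0.5]))   # ...applied after layer 1
h2 = act(h1)                      # ...and reused after layer 2
```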


Rational Neural Network Controllers

Newton, Matthew, Papachristodoulou, Antonis

arXiv.org Artificial Intelligence

Neural networks have shown great success in many machine learning related tasks, due to their ability to act as general function approximators. Recent work has demonstrated the effectiveness of neural networks in control systems (known as neural feedback loops), most notably by using a neural network as a controller. However, one of the big challenges of this approach is that neural networks have been shown to be sensitive to adversarial attacks. This means that, unless they are designed properly, they are not an ideal candidate for controllers due to issues with robustness and uncertainty, which are pivotal aspects of control systems. There has been initial work on robustness to both analyse and design dynamical systems with neural network controllers. However, one prominent issue with these methods is that they use existing neural network architectures tailored for traditional machine learning tasks. These structures may not be appropriate for neural network controllers and it is important to consider alternative architectures. This paper considers rational neural networks and presents novel rational activation functions, which can be used effectively in robustness problems for neural feedback loops. Rational activation functions are replaced by a general rational neural network structure, which is convex in the neural network's parameters. A method is proposed to recover a stabilising controller from a Sum of Squares feasibility test. This approach is then applied to a refined rational neural network which is more compatible with Sum of Squares programming. Numerical examples show that this method can successfully recover stabilising rational neural network controllers for neural feedback loops with non-linear plants with noise and parametric uncertainty.


Rational neural networks

Boullé, Nicolas, Nakatsukasa, Yuji, Townsend, Alex

arXiv.org Machine Learning

We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more efficiently than ReLU networks. The flexibility and smoothness of rational activation functions make them an attractive alternative to ReLU, as we demonstrate with numerical experiments.
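A minimal sketch of such an activation, $r(x) = P(x)/Q(x)$ with a cubic numerator and quadratic denominator (the type-(3, 2) combination studied in the paper); the coefficients below are illustrative placeholders, not the paper's ReLU-fitting initialisation:

```python
import numpy as np

# Rational activation of type (3, 2): cubic P over quadratic Q.
# The illustrative Q below is 1 + 0.5*x^2 >= 1, so r has no real poles.
def rational(x, p=(0.0, 1.0, 0.5, 0.1), q=(1.0, 0.0, 0.5)):
    P = np.polyval(p[::-1], x)   # p given low to high degree
    Q = np.polyval(q[::-1], x)
    return P / Q
```

In a trainable setting, the entries of `p` and `q` would be optimised alongside the network weights.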